npj Precision Oncology
Springer Science and Business Media LLC
All preprints, ranked by how well they match npj Precision Oncology's content profile, based on 48 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Anthony, H.; Seoighe, C.
Subclonal diversity within a tumor is highly relevant for tumor evolution and treatment. This diversity is often referred to as intratumoral heterogeneity and is known to complicate the interpretation of single-test biomarkers. Microsatellite instability (MSI) is one such biomarker, which is used to guide immune check-point inhibitor treatment through the classification of samples as either having high microsatellite instability (MSI-H) or as being microsatellite stable (MSS). Although established as a therapeutic biomarker, it remains unclear whether MSI itself is a heterogeneous phenomenon. To investigate heterogeneity in MSI status, we integrated single-cell sequencing data from 134 samples across 49 individuals and developed a computational pipeline to infer MSI-H cells and quantify heterogeneity in MSI status. We found evidence of intratumoral heterogeneity in MSI both in individuals originally classified as MSI-H and MSS. Approximately a third of individuals showed evidence of divergence in MSI status between distinct clusters of cancer cells and most individuals had distinct MSI-H and MSS subclones. These results challenge the assumption that MSI should be treated as a binary biomarker and suggest the single-biopsy tests in current use could overlook a salient feature of this important molecular phenotype. Accounting for heterogeneity may lead to improved biomarker performance and, potentially, help explain reports of intrinsic treatment resistance and low overall response rate in MSI-H cancers. Further studies are warranted to determine the frequency of heterogeneity in MSI at the population level, and whether the presence of both MSI-H and MSS subclones can have clinical implications.
Zhang, Y.; Blomquist, T. M.; Kusko, R.; Stetson, D.; Zhang, Z.; Yin, L.; Sebra, R.; Gong, B.; LoCoco, J. S.; Mittal, V. K.; Novoradovskaya, N.; Yeo, J.-Y.; Dominiak, N.; Hipp, J.; Raymond, A.; Qiu, F.; Arib, H.; Smith, M. L.; Brock, J. E.; Farkas, D. H.; Craig, D. J.; Crawford, E. L.; Li, D.; Morrison, T.; Tom, N.; Xiao, W.; Yang, M.; Mason, C. E.; Richmond, T. A.; Jones, W.; Johann, D. J.; Shi, L.; Tong, W.; Willey, J. C.; Xu, J.
Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized normal cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis by four oncopanels, and variants resulting from technical error were identified. Tissue sections that failed more frequently showed low cellularity, lower than recommended library preparation DNA input, or target sequencing depth. Importantly, sections from block surfaces were more likely to show FFPE-specific errors, akin to "edge effects" seen in histology, and the depth of formalin damage was related to fixation time. To assure reliable results, we recommend avoiding the block surface portion and restricting mutation detection to genomic regions of high confidence.
Eckardt, J.-N.; Srivastava, I.; Schulze, F.; Winter, S.; Schmittmann, T.; Riechert, S.; Schneider, M.; Reichel, L.; Gediga, M. E. H.; Sockel, K.; Sulaiman, A. S.; Roellig, C.; Kroschinsky, F.; Asemissen, A.-M.; Pohlkamp, C.; Haferlach, T.; Bornhaeuser, M.; Wendt, K.; Middeke, J. M.
Evaluation of bone marrow morphology by experienced hematologists is key in the diagnosis of myeloid neoplasms, especially to detect subtle signs of dysplasia in myelodysplastic neoplasms (MDS). The majority of recently introduced deep learning (DL) models in cytomorphology rely heavily on manually drafted cell-level labels, a time-consuming, laborious process that is prone to substantial inter-observer variability, thereby representing a substantial bottleneck in model development. Instead, we used robust image-level labels for end-to-end DL and trained several state-of-the-art computer vision models on bone marrow smears of 463 patients with MDS, 1301 patients with acute myeloid leukemia (AML), and 236 bone marrow donors. For the binary classifications of MDS vs. donors and MDS vs. AML, we obtained an area-under-the-receiver-operating-characteristic (ROCAUC) of 0.9708 and 0.9945, respectively, in our internal test sets. Results were confirmed in an external validation cohort of 50 MDS patients with corresponding ROCAUC of 0.9823 and 0.98552, respectively. Explainability via occlusion sensitivity mapping showed high network attention on cell nuclei not solely of dysplastic cells. We not only provide a highly accurate model to detect MDS from bone marrow smears, but also underline the capabilities of end-to-end learning to solve the bottleneck of time-consuming cell-level labeling.
Zhao, C.; Jiang, T.; Ju, J. H.; Zhang, S.; Tao, J.; Fu, Y.; Lococo, J.; Dockter, J.; Powlowski, T.; Bilke, S.
Background: As knowledge of mechanisms that drive the development of cancer grows, there has been corresponding growth in therapies specific to a mechanism. While these therapies show improvements in patient outcomes, they can be expensive and are effective only for a subset of patients. These treatments drive interest in research focused on the assignment of cancer therapies based on aberrations in individual genes or biomarkers that assess the broader mutational landscape, including microsatellite instability (MSI) and tumor mutational burden (TMB). Methods: Here we describe the TruSight Oncology 500 (TSO500; Research Use Only) bioinformatics workflow. This tumor-only approach leverages the next-generation sequencing-based assay TSO500 to enable high-fidelity determination of DNA variants across 523 cancer-relevant genes, as well as MSI status and TMB in formalin-fixed paraffin-embedded (FFPE) samples. Results: The TSO500 bioinformatic workflow integrates unique molecular identifier (UMI)-based error correction and a dual-approach variant filtering strategy that combines statistical modeling of error rates and database annotations to achieve detection of variants with allele frequency approaching 5% with 99.9998% per-base specificity and 99% sensitivity in FFPE samples representing a variety of tumor types. TMB determined using the tumor-only workflow of TSO500 correlated well with tumor-normal (N=170, adjusted R²=0.9945) and whole-exome sequencing (N=108, adjusted R²=0.933). Similarly, MSI status determined by TSO500 showed agreement (N=106, 98% agreement) with an MSI-PCR assay. Conclusion: TSO500 is an accurate tumor-only workflow that enables researchers to systematically characterize tumors and identify the next generation of clinical biomarkers.
Stockslager, M. A.; Malinowski, S.; Touat, M.; Yoon, J. C.; Geduldig, J.; Mirza, M.; Kim, A. S.; Wen, P. Y.; Chow, K.-H.; Ligon, K. L.; Manalis, S. R.
Functional precision medicine aims to match each cancer patient to the most effective treatment by performing ex vivo drug susceptibility testing on the patient's tumor cells. Despite promising feasibility studies, functional drug susceptibility testing is not yet used in clinical oncology practice to make treatment decisions. Often, functional testing approaches have measured ex vivo drug response using metabolic assays such as CellTiter-Glo, which measures ATP as a proxy for numbers of viable cells. As a complement to these existing metabolic drug response assays, we evaluated whether biophysical assays based on cell mass (the suspended microchannel resonator mass assay) or size as measured by microscopy (the IncuCyte assay) could be used as a readout for ex vivo drug response. Using these biophysical assays, we profiled the ex vivo temozolomide responses of a retrospective cohort of 70 glioblastoma patient-derived neurosphere models with matched clinical outcomes and found that both biophysical assays predicted patients' overall survival with similar power to MGMT promoter methylation, the clinical gold standard biomarker for predicting temozolomide response in glioblastoma. These findings suggest that biophysical assays could be a useful complement to existing metabolic approaches as "universal biomarkers" to measure sensitivity or resistance to anti-cancer drugs with a wide variety of cytostatic or cytotoxic mechanisms. One-sentence summary: By using biophysical assays to perform ex vivo drug susceptibility testing on 70 glioblastoma patient-derived neurosphere models, we find that functional testing predicts the duration that patients survive when treated with temozolomide, the standard-of-care chemotherapy.
Swoboda, D. M.; DeZern, A. E.; England, J. T.; Venugopal, S.; Kehoe, T.; Aubrey, B. J.; Raddi, M. G.; Consagra, A.; Wang, J.; Andreadakis, J.; Rivero, G.; Stahl, M.; Zeidan, A. M.; Haferlach, T.; Brunner, A. M.; Buckstein, R.; Santini, V.; Della Porta, M. G.; Sekeres, M. A.; Nazha, A.
Background: Large language models (LLMs) perform well on standardized medical exam questions, but their reliability for complex hematology decision making is uncertain. We compared four general-purpose LLMs (GPT-4o, GPT-o3, Claude Sonnet 4, and DeepSeek-V3) with a Virtual MDS Panel (VMP), a coordinated multi-agent AI system in which domain-specialized, rule-bound software agents (WHO/ICC guidelines; IPSS-R/IPSS-M; NCCN) collaborate to generate tumor-board-level recommendations. Methods: Each model generated diagnostic, prognostic, and treatment recommendations for 30 myelodysplastic syndrome cases. Nine international MDS experts from five institutions, blinded to model identity, completed 3,000 structured ratings using 5-point Likert scales for diagnosis, prognosis, and therapy and classified errors by severity. Results: General-purpose LLMs achieved modest expert ratings (overall mean scores: 3.7 for GPT-o3, 3.2 for GPT-4o, 3.1 for DeepSeek, and 3.0 for Claude) and contained major factual errors in at least 24% of responses. The VMP increased the proportion of outputs rated 4 or higher to 87% (vs. 34-66% for general-purpose models), improved mean scores to 4.3 overall (4.3 for diagnosis, 4.4 for prognosis, and 4.1 for therapy), and reduced major errors to 8%. Conclusions: In this blinded evaluation of 30 complex MDS cases, general-purpose LLMs produced clinically important errors at rates that raise safety concerns for autonomous hematology decision making. The VMP, a rule-bound, multi-agent architecture, approached expert-level accuracy supporting its potential role as an effective decision-support tool for MDS in the future.
Harley, A.
Determining the primary site of origin for metastatic tumors is one of the open problems in cancer care because the efficacy of treatment often depends on the cancer tissue of origin. Classification methods that can leverage tumor genomic data and predict the site of origin are therefore of great value. Because tumor DNA point mutation data is very sparse, only limited accuracy (64.5% for 12 tumor classes) was previously demonstrated by methods that rely on point mutations as features (1). Tumor classification accuracy can be greatly improved (to over 90% for 33 classes) by relying on gene expression data (2). However, this additional data is often not readily available in the clinical setting, because point mutations are better profiled and targeted by clinical mutational profiling. Here we sought to develop an accurate deep transfer learning and fine-tuning method for tumor sub-type classification, where the predicted class is indicative of the primary site of origin. Our method significantly outperforms the state-of-the-art for tumor classification using DNA point mutations, reducing the error by more than 30% while discriminating over many more classes on The Cancer Genome Atlas (TCGA) dataset. Using our method, we achieve state-of-the-art tumor type classification accuracy of 78.3% for 29 tumor classes relying on DNA point mutations in the tumor only.
R Rao, V.; Phung, T.; Sukhadia, S. S.
BRAF V600E mutations are critical oncogenic drivers in cutaneous melanoma, influencing treatment decisions and outcomes. However, conventional molecular assays face limitations, including tissue availability, cost, and access. To address this, we present an explainable deep learning model that predicts BRAF V600E mutation status directly from diagnostic whole-slide images (WSIs) of skin cutaneous melanoma. Using histopathological WSIs from The Cancer Genome Atlas (TCGA) and their corresponding mutation labels (BRAF wildtype vs. BRAF V600E), we trained a weakly supervised deep learning pipeline, XpressO, to identify tumor regions of interest (ROIs) predictive of BRAF mutation status. The model outputs attention heatmaps highlighting spatially relevant diagnostic features and computes a combined probability score from the top ten attention regions per WSI. These regions are further reviewed by a pathologist for biological appropriateness. On an independent test set, the model achieved an AUC of 0.79 with balanced precision and recall, correctly identifying 7 of 8 BRAF V600E mutant cases. This demonstrates the model's ability to capture phenotypic correlates of mutation status and highlights the potential of computational pathology in precision oncology. Our approach offers a scalable, interpretable, and cost-effective alternative to molecular testing, particularly in resource-limited settings.
Chow, S.; Abelman, D. D.; Danesh, A.; Pedersen, S.; Nong-Wei, E.; Scott, D. S.; Suleman, A.; Roos, K.; Chen, C. I.; Berinstein, N.; Trudel, S.; Pugh, T. J.
Purpose: We assessed the utility of blood cell-free DNA (cfDNA) whole genome sequencing (cfWGS) for minimal residual disease (MRD) monitoring in Waldenstrom's macroglobulinemia (WM) by comparing this to 1) targeted panel sequencing of 27 genes of interest in WM and targeted capture of immunoglobulin gene rearrangements in blood and bone marrow and 2) multiplex PCR of immunoglobulin loci followed by Illumina sequencing (clonoSEQ). Experimental design: Samples were collected from 7 patients on a clinical trial who were treated uniformly with chemoimmunotherapy and Bruton's tyrosine kinase inhibitor (BTKi). Samples were collected prior to starting treatment and at clinical timepoints up to 18 months. MRD detection technologies were compared across all timepoints. Results: cfWGS was superior to both in-house targeted panel sequencing on cfDNA and clinical NGS in peripheral blood (PB) cells, using clinical bone marrow (BM) NGS as a standard. Tumor burden measured by cfWGS reflected MRD counts by clonoSEQ in BM. Conclusions: cfWGS may be a valuable non-invasive alternative to bone marrow testing in WM patients who require close follow-up and provides greater sensitivity than targeted panel sequencing of cfDNA. Statement of Translational Relevance: Whole-genome sequencing in cell-free DNA (cfWGS) is a highly sensitive marker of minimal residual disease that has application as a biomarker in clinical trials. cfWGS more accurately reflects bone marrow tumor burden than other available non-invasive measures to date. Further exploration is warranted to determine its full potential for use in cancer diagnostics and research.
Diaz, F. C.; Waldrup, B.; Carranza, F. G.; Manjarrez, S.; Velazquez-Villarreal, E.
Background: Early-onset colorectal cancer (EOCRC), defined as diagnosis before age 50, is rising rapidly and disproportionately affects high-risk populations, particularly Hispanic/Latino (H/L) individuals, who experience the steepest increases in incidence and mortality. While prevention and screening strategies have curbed late-onset CRC rates, EOCRC remains outside standard screening guidelines and is projected to become the leading cause of cancer-related death in individuals aged 20-49 by 2030. FOLFOX (folinic acid, fluorouracil, and oxaliplatin) is a standard first-line therapy for microsatellite stable (MSS) CRC lacking actionable driver mutations; however, its efficacy and genomic impact in EOCRC, particularly in underrepresented groups, remain poorly understood. The phosphatidylinositol 3-kinase (PI3K) pathway regulates cell growth, survival, and metabolism, and its alterations have been implicated in therapeutic resistance and adverse outcomes. Yet, the prevalence, clinical relevance, and treatment-specific associations of PI3K pathway alterations in EOCRC remain underexplored. Methods: We analyzed somatic mutation and clinical data from 2,515 CRC patients (266 H/L and 2,249 Non-Hispanic White [NHW]) across publicly available genomic datasets. Patients were stratified by age at diagnosis (EOCRC <50 vs. LOCRC ≥50), ancestry (H/L vs. NHW), and FOLFOX treatment status. PI3K pathway alterations--including mutations in PIK3CA, PTEN, AKT isoforms, and regulatory genes--were identified using curated pathway definitions. Mutation prevalence was compared across groups using Fisher's exact or chi-squared tests. AI-HOPE-PI3K, a conversational AI platform, was deployed to automate cohort construction, stratify subgroups, and perform post-hoc survival analysis. Results: PI3K pathway alterations were observed across all demographic groups.
In EO NHW patients treated with FOLFOX, Kaplan-Meier analysis revealed significantly reduced overall survival among those with PI3K pathway alterations (n = 124) compared with unaltered counterparts (n = 251; p = 0.0008), identifying alterations as a candidate prognostic biomarker in this subgroup. AI-guided subgroup interrogation further highlighted mutation-specific signals: INPP4B and RPTOR emerged as exploratory candidates in EO H/L patients but did not show significant treatment- or ancestry-specific enrichment upon confirmatory testing. Similarly, ancestry-stratified analysis of PIK3R2 mutations revealed comparable rates in EO H/L (1.37%) and EO NHW (1.6%) FOLFOX-treated patients (p = 1.0). Across ancestry and age groups, mutational landscape analysis revealed diverse molecular events--including missense, nonsense, splice-site, frameshift, and in-frame deletions--underscoring the heterogeneity of PI3K pathway dysregulation. Conclusions: This study identifies PI3K pathway alterations as a potential prognostic marker of poor survival in EO NHW patients receiving FOLFOX and uncovers ancestry- and treatment-specific mutational differences in high-risk CRC populations. By integrating clinical, molecular, and treatment variables, the AI-HOPE and AI-HOPE-PI3K platforms enabled rapid, reproducible, and fine-grained analysis of complex datasets. These findings underscore the need for ancestry-informed molecular profiling to optimize therapeutic strategies and highlight AI-guided interrogation as a powerful tool for advancing precision oncology in underrepresented and disproportionately affected CRC populations.
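The prevalence comparison described in the abstract above (Fisher's exact test on mutation counts in two groups) can be illustrated with a minimal sketch. The counts used here are hypothetical, chosen only to be consistent with the reported percentages (1.37% and 1.6%); the abstract does not give the underlying counts, and the function name is illustrative rather than part of AI-HOPE-PI3K.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are mutated / not mutated. The p-value sums
    the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that x of the col1 mutated cases fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


# Hypothetical counts (assumption): 1 of 73 vs. 6 of 375 carriers.
p_value = fisher_exact_two_sided(1, 72, 6, 369)
print(f"p = {p_value:.3f}")
```

With these assumed counts the observed table is the most probable one under the null, so the two-sided p-value is essentially 1, in line with the comparable rates reported.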
Gwerder, M.; Demir, C. S.; Williams, H. L.; Lugli, A.; Martinez, C. G.; Kowal, J.; Khan, A.; Kirchner, P.; Koessler, T.; Berger, M. D.; Weigert, M.; Zlobec, I.
In rectal cancer, where a subset of patients undergo chemoradiotherapy, there is a need for improved pretreatment biomarkers applicable to biopsies. Tumor budding (TB) is a biomarker used in colon cancer and, due to its link to epithelial-mesenchymal transition (EMT), is hypothesized to be a potential marker for therapy resistance. Assessment of the utility of tumor buds in rectal biopsies is challenging due to their rarity. As EMT-related processes are also seen in other morphological features beyond tumor buds, we investigated EMT in tumor tissue including morphological features such as tumor cluster size and fibril-like structures. To do so, we leveraged a cohort of colon cancer whole-slide images and another cohort consisting of rectal cancer biopsies, visualized using hyperplex immunofluorescence to identify tumor and EMT-associated proteins. We built a custom image analysis pipeline to detect and segment tumor buds and other morphological features and correlated them with molecular expression intensities. We found strong correlations between EMT up-regulation and morphological transition states, both at the invasive margin and the tumor center. We furthermore observed a link between morpho-molecular transitions and histological growth patterns, which in turn can inform novel biomarkers. Finally, quantification of these morpho-molecular transition states in rectal biopsies showed their impact on survival after neoadjuvant chemoradiotherapy.
He, L.; Ren, Y.; Chen, H.; Guinn, D.; Parashar, D.; Chen, C.; Yuan, S.; Korostyshevskiy, V.; Beckman, R. A.
PURPOSE: Molecular oncology determines biomarker-defined niche indications. Basket trials pool histologic indications sharing molecular pathophysiology, potentially improving development efficiency. Currently, basket trials have been confirmatory only for exceptional therapies. Our previous randomized basket design may be generally suitable in the resource-intensive confirmatory phase, maintains high power, and provides nearly k-fold increased efficiency for k indications, but controls false positives for the pooled result only. Since false positive control by indications (FWER) may sometimes be required, we now simulate a variant of this basket design controlling FWER at 0.025k, the total FWER of k separate randomized trials. METHODS: The previous design eliminated indications at an interim analysis, conducting a final pooled analysis of remaining indications. To control FWER, we rechecked individual indications at a prospectively defined level of statistical significance after any positive pooled result. We simulated this modified design under numerous scenarios varying design parameters. Only designs controlling FWER and minimizing estimation bias were allowable. RESULTS: Sequential analyses (interim, pooled, and post-individual tests) result in cumulative power losses. Optimal performance results when k = 3 or 4. We report efficiency (expected # true positives/expected sample size) relative to k parallel studies, at 90% power ("uncorrected") or at the power achieved in the basket trial ("corrected", because conventional designs could also increase efficiency by sacrificing power). Efficiency and power (percentage active indications identified) improve with higher percentage of initial indications active. Up to 92% uncorrected and 38% corrected efficiency improvement is possible, with power ≈ 60%. CONCLUSIONS: Even under FWER control, randomized confirmatory basket trials substantially improve development efficiency. Initial indication selection is critical.
The design is particularly attractive when enrollment challenges preclude full powering of individual indications.
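The efficiency metric defined in the abstract above (expected number of true positives divided by expected sample size, reported relative to k parallel studies) reduces to simple arithmetic. The sketch below uses invented inputs purely for illustration; none of the numbers come from the authors' simulations, and the function names are hypothetical.

```python
def efficiency(expected_true_positives, expected_sample_size):
    """Expected true positives per enrolled patient."""
    return expected_true_positives / expected_sample_size


def relative_improvement(basket, parallel):
    """Fractional efficiency gain of the basket design over parallel trials."""
    return basket / parallel - 1.0


# Invented inputs (assumption): k = 3 indications, all truly active.
k = 3
per_trial_n = 300          # patients per stand-alone trial at 90% power
parallel_power = 0.90      # "uncorrected" comparison
basket_power = 0.60        # fraction of active indications identified
basket_expected_n = 400    # expected basket sample size after pruning

parallel_eff = efficiency(k * parallel_power, k * per_trial_n)
basket_eff = efficiency(k * basket_power, basket_expected_n)
uncorrected_gain = relative_improvement(basket_eff, parallel_eff)

# "Corrected" comparison: parallel trials sized for the basket's power.
# Assumption: such trials would need 180 patients each (illustrative only).
corrected_parallel_eff = efficiency(k * basket_power, k * 180)
corrected_gain = relative_improvement(basket_eff, corrected_parallel_eff)

print(f"uncorrected gain: {uncorrected_gain:.0%}")
print(f"corrected gain:   {corrected_gain:.0%}")
```

The corrected gain is smaller because the comparator trials are also allowed to trade power for sample size, which is the reason the abstract reports both figures.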
Erkan, E. P.; Hämäläinen, E.; Kolikova, J.; Kuc, K.; Ojala, K.; Kukkonen, M.; Hermelo, I.; Koskensalo, S.; Tarvainen, T.; Leppä, A.; Keränen, I.; Haapamäki, C.; Karjalainen, E.; Carpelan-Holmström, M.; Renkonen-Sinisalo, L.; Koskenvuo, L.; Puolakkainen, P.; Pekka-Mecklin, J.; Viroläinen, E.; iCAN, ; Nykter, M.; Aaltonen, L.; Lepistö, A.; Ristimäki, A.; Seppälä, T. T.
Background & Aims: Patients with colorectal cancer have heterogeneous clinical responses to chemotherapy, although clinical guidelines advise little variability in treatment selection based on molecular tumor features. Precision oncology research typically utilizes patient-derived tumor organoids (PDTO) to predict clinical outcomes, but such efforts are often not directed towards identification of molecular factors underlying differential responses to therapy. Methods: Bulk RNA-sequencing was performed on treatment-naive PDTOs, and gene expression data was combined with drug sensitivity data to identify transcriptomic features associated with low in vitro sensitivity to chemotherapy. Whole-exome sequencing was performed on primary tumors to infer the somatic mutations of PDTOs and used to identify somatic mutations associated with differential in vitro drug responses. Publicly available gene expression and drug sensitivity data sets were used to validate the results. RNA interference was used for functional validation. Results: PDTOs with low chemosensitivity had high JAK-STAT pathway activity resulting from high expression of interferon-stimulated genes. Evidence from single-cell RNA-sequencing confirmed chemotherapy-induced expression of interferon-stimulated genes in epithelial cells of cancers with partial response. EPSTI1 knockdown decreased cancer cell viability and sensitized cells to chemotherapy. Conclusions: Sustained interferon signaling in epithelial cancer cells contributes to incomplete pathologic response in colorectal cancer. The findings highlight the potential of JAK-STAT inhibition or TRAIL pathway activation to enhance chemotherapy efficacy. Future studies investigating pharmacologic modulation of these pathways in preclinical CRC models are needed to determine their viability as therapeutic targets.
Hoang, D.-T.; Dinstag, G.; Hermida, L. C.; Ben-Zvi, D. S.; Elis, E.; Caley, K.; Sinha, S.; Sinha, N.; Dampier, C. H.; Beker, T.; Aldape, K.; Aharonov, R.; Stone, E. A.; Ruppin, E.
Advances in artificial intelligence have paved the way for leveraging hematoxylin and eosin (H&E)-stained tumor slides for precision oncology. We present ENLIGHT-DeepPT, an approach for predicting response to multiple targeted and immunotherapies from H&E slides. In contrast to existing approaches that aim to predict treatment response directly from the slides, ENLIGHT-DeepPT is an indirect two-step approach consisting of (1) DeepPT, a new deep-learning framework that predicts genome-wide tumor mRNA expression from slides, and (2) ENLIGHT, which predicts response based on the DeepPT-inferred expression values. DeepPT successfully predicts transcriptomics in all 16 TCGA cohorts tested and generalizes well to two independent datasets. Importantly, ENLIGHT-DeepPT successfully predicts true responders in five independent patient cohorts involving four different treatments spanning six cancer types with an overall odds ratio of 2.44, increasing the baseline response rate by 43.47% among predicted responders, without the need for any treatment data for training. Furthermore, its prediction accuracy on these datasets is comparable to a supervised approach predicting the response directly from the images, trained and tested on the same cohort in cross validation. Its future application could provide clinicians with rapid treatment recommendations for an array of different therapies and, importantly, may contribute to advancing precision oncology in developing countries. Statement of Significance: ENLIGHT-DeepPT is the first approach shown to successfully predict response to multiple targeted and immune cancer therapies from H&E slides. Unlike all previous H&E slide prediction approaches, it does not require supervised training on a specific cohort for each drug/indication treatment but is trained to predict expression on the TCGA cohort and then can predict response to an array of treatments without any further training.
ENLIGHT-DeepPT can provide rapid treatment recommendations to oncologists and help advance precision oncology in underserved regions and low-income countries.
Priyadarshi, S.; Mazumder, A. C.; Neekhra, B.; Biswas, S.; CHOWDHURY, D.; Gupta, D.; Haldar, S.
Background: Accurate quantification of cancer hallmark activity is essential for understanding tumor progression, tailoring treatments, and improving patient outcomes. Traditional methods, such as histopathological grading and immunohistochemistry for protein expression, often overlook the complex interplay between cancer cells and the tumor microenvironment and provide limited insight into hallmark-specific mechanisms. We aimed to develop OncoMark, a high-throughput deep learning-enabled neural multi-task learning framework capable of systematically quantifying integrative hallmark activities using transcriptomics data from routine tumor biopsies. Methods: In this study, we acquired single-cell transcriptomics data from 941 tumor samples across 14 tissue types, comprising nearly 3.1 million cells from 56 studies conducted worldwide, to form a large multicenter dataset. Our model employs a supervised neural multi-task learning method designed to predict multiple cancer hallmarks present in the biopsy samples simultaneously. The OncoMark model was developed and tested on 90% of the studies (patients from 51 studies) using repeated five-fold cross-validation performed twice. For further evaluation, the model was assessed on the remaining 10% of the studies (patients from 5 studies) that were excluded from the initial training and testing dataset. Additionally, we included patients from publicly available datasets, including TCGA, GTEx, ANTE, MET500, POG570, CCLE, TARGET, and PCAWG, to validate the model's performance. The primary objective was to evaluate the performance of the model in identifying cancer hallmarks in cancer datasets and ensure no hallmark predictions were made in normal samples across the four prespecified groups: (i) internal test set, (ii) external test set, (iii) normal samples (real-world), and (iv) cancer samples (real-world).
Findings: OncoMark demonstrated exceptional performance in predicting cancer hallmark states, achieving near-perfect accuracy across internal test data and five external test datasets. Internal testing consistently showed accuracy, precision, recall, and F1 scores exceeding 99%, underscoring the model's reliability across hallmarks. External testing further confirmed these findings, with accuracy, precision, recall, F1 scores, and balanced accuracy consistently exceeding 96.6%, and multiple datasets achieving perfect scores, highlighting the model's exceptional generalizability and robustness. Specificity tests using GTEx and ANTE datasets accurately classified normal tissues, while sensitivity analysis on TCGA, MET500, CCLE, TARGET, PCAWG, and POG570 datasets effectively identified cancer hallmarks. Interpretation: We developed an AI-based framework that enables accurate, efficient, and cost-effective quantification of cancer hallmark activity directly from transcriptomics data. The framework demonstrated significant potential as an assistive tool for guiding personalized treatment strategies and advancing the clinical management of cancer patients. Funding: Ashoka University, S.N. Bose National Centre for Basic Sciences, Mphasis F1 Foundation, DST SERB Core Research Grant. Research in Context. Evidence before this study: We conducted an extensive literature search using Google Scholar and PubMed without language restrictions, employing search terms such as "(Predicting OR Classifying OR Annotating) and (cancer hallmarks) AND (Deep OR Machine Learning) OR (Artificial Intelligence OR AI)." While there have been advancements in molecular oncology and computational methodologies over the two decades since the concept of cancer hallmarks was first introduced, a comprehensive machine learning or deep learning framework to annotate all cancer hallmarks simultaneously from tumor biopsy samples remains to be developed.
Additionally, the scarcity of hallmark-annotated datasets has posed a significant challenge, hindering the development of robust predictive models. Added value of this study: This study introduces OncoMark, a novel high-throughput neural multi-task learning (N-MTL) framework designed to predict all cancer hallmark activities simultaneously from biopsy samples. OncoMark addresses the lack of annotated hallmark-specific data by generating synthetic biopsy (pseudo-bulk) datasets annotated with hallmark activity, meticulously modeled to reflect real-world tumor biology while maintaining clinical relevance. The framework employs a multi-task learning approach to capture interdependencies among hallmarks, advancing beyond isolated predictions to offer a holistic view of tumor biology. Validation on five independent datasets comprising 95 patient samples demonstrated its generalizability and reproducibility. Further external validation using eight datasets, encompassing over 11,679 cancer and 8,348 normal patient samples, reinforced its robustness. To promote clinical integration, a user-friendly web-based tool was developed, enabling seamless access for oncologists and researchers. Implications of all the available evidence: The OncoMark framework represents a transformative advancement in cancer diagnostics and treatment planning. By enabling accurate and reproducible prediction of all hallmark activities simultaneously from biopsy samples, this model paves the way for precision oncology at scale. Its ability to systematically capture hallmark interdependencies provides deeper insights into tumor behavior, guiding the development of individualized targeted therapies. The incorporation of a web-based interface ensures the accessibility of this innovation to clinicians worldwide, bridging the gap between computational oncology and clinical practice.
Following further validation and integration into healthcare workflows, OncoMark has the potential to improve cancer outcomes by delivering timely, cost-effective, and precise tumor analyses, facilitating informed therapeutic decision-making with unparalleled precision.
Cancer progression is driven by a set of well-defined biological principles--collectively termed the "hallmarks of cancer"--yet current diagnostic approaches seldom incorporate these distinct molecular features into clinical practice. Despite substantial progress in molecular oncology, traditional methods like histopathological grading and immunohistochemical assays often fail to capture the complex interplay between cancer cells and the tumor microenvironment, emphasizing the need for robust computational frameworks capable of systematically quantifying hallmark-specific activity. Here, we address this gap by developing OncoMark, a high-throughput neural multi-task learning (N-MTL) framework designed to simultaneously quantify hallmark activities in tumor biopsies using transcriptomics data. We show that OncoMark achieves near-perfect accuracy, precision, recall, and F1 scores (>99%) in cross-validation, with external validation consistently exceeding 96.6% on five independent datasets. Further evaluation on eight additional datasets--including large-scale cancer cohorts (TCGA, MET500, CCLE, TARGET, PCAWG, POG570) and normal tissue datasets (GTEx, ANTE)--demonstrated high specificity for normal samples and robust sensitivity for hallmark prediction in cancer. By delivering a comprehensive and cost-effective molecular portrait of tumor biology and providing a user-friendly web platform accessible at https://oncomark-ai.hf.space/, OncoMark has the potential to guide tailored treatment strategies and advance precision oncology.
More broadly, this framework signifies a transformative step toward routine hallmark-based diagnostics, promising to improve patient outcomes by facilitating timely and precise tumor profiling.
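For readers unfamiliar with the metrics quoted in the abstract above, the following minimal sketch shows how accuracy, precision, recall, F1, and balanced accuracy relate to one another for a single binary label (for example, one hallmark being active or inactive). Treating each hallmark as a binary task is an assumption made here for illustration, the function name `binary_metrics` is hypothetical, and the counts are invented; none of this is taken from the OncoMark code or results.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Balanced accuracy averages sensitivity and specificity, so it is
        # not inflated when one class is much larger than the other.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Invented counts, for illustration only.
for name, value in binary_metrics(tp=90, fp=5, tn=95, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy is the most informative of these when cancer and normal samples are present in very different numbers, which is one plausible reason it is reported alongside plain accuracy.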
Carta, M. G.; Angeloni, M.; Toegel, L.; Schubart, C.; Hoelsken, A.; Stoehr, R.; Vatrano, S.; Rizzi, D.; Magni, P.; Fraggetta, F.; Hartmann, A.; Haller, F.; Ferrazzi, F.
Molecular Tumour Boards (MTBs) rely on different bioinformatics tools and knowledgebases for variant annotation, oncogenicity classification, and estimation of complex biomarkers to identify actionable alterations. However, the typical bioinformatics workflow to process raw next-generation sequencing (NGS) data into clinically meaningful variants involves multiple steps and is inherently complex, thus requiring repeated manual intervention and causing delays in providing molecularly informed precision oncology. Here, we aimed to overcome these limitations by developing a fully-automated integrative workflow to support NGS-based analyses within MTBs. Our workflow was established at the Institute of Pathology, University Hospital Erlangen (Germany), and adapted to the fully digitized Pathology department at Gravina Hospital in Caltagirone (Italy), using the Illumina TruSight Oncology 500 HRD assay as a case study. A trigger event initiates all the downstream bioinformatics analyses to support variant interpretation. In Erlangen, the trigger event is the automatic detection of new NGS data on the Illumina Connected Analytics cloud-based platform. In Caltagirone, the analyses are manually triggered from the anatomic pathology laboratory information system (AP-LIS). The workflow automatically: (i) generates an intuitive overview of sequencing quality metrics, (ii) performs variant annotation, (iii) classifies variant oncogenicity through a fully-automated implementation of the ClinGen/CGC/VICC guidelines, and (iv) generates homologous recombination deficiency scores with genomic instability plots. In the digitized pathology department, results can be readily opened from the AP-LIS and visualized in the patient gallery. Taken together, our end-to-end fully-automated workflow streamlines NGS-based analyses within MTBs by integrating variant interpretation, oncogenicity classification, and estimation of clinically relevant biomarkers.
Abel, J.; Jain, S.; Rajan, D.; Padigela, H.; Leidal, K.; Prakash, A.; Conway, J.; Nercessian, M.; Kirkup, C.; Javed, S. A.; Egger, R.; Trotter, B.; Gerardin, Y.; Brosnan-Cashman, J. A.; Dhoot, A.; Montalto, M. C.; Wapinski, I.; Khosla, A.; Drage, M. G.; Yu, L.; Taylor-Weiner, A.
While alterations in nucleus size, shape, and color are ubiquitous in cancer, comprehensive quantification of nuclear morphology across a whole-slide histologic image remains a challenge. Here, we describe the development of a pan-tissue, deep learning-based digital pathology pipeline for exhaustive nucleus detection, segmentation, and classification and the utility of this pipeline for nuclear morphologic biomarker discovery. Manually-collected nucleus annotations were used to train an object detection and segmentation model for identifying nuclei, which was deployed to segment nuclei in H&E-stained slides from the BRCA, LUAD, and PRAD TCGA cohorts. Interpretable features describing the shape, size, color, and texture of each nucleus were extracted from segmented nuclei and compared to measurements of genomic instability, gene expression, and prognosis. The nuclear segmentation and classification model trained herein performed comparably to previously reported models. Features extracted from the model revealed differences sufficient to distinguish between BRCA, LUAD, and PRAD. Furthermore, cancer cell nuclear area was associated with increased aneuploidy score and homologous recombination deficiency. In BRCA, increased fibroblast nuclear area was indicative of poor progression-free and overall survival and was associated with gene expression signatures related to extracellular matrix remodeling and anti-tumor immunity. Thus, we developed a powerful pan-tissue approach for nucleus segmentation and featurization, enabling the construction of predictive models and the identification of features linking nuclear morphology with clinically-relevant prognostic biomarkers across multiple cancer types.
Kim, J.; Ye, S.; Kwak, J.-M.; Choi, D.; Kim, S.; Jeong, H. J.; Hong, E.; Lee, J. W.; Kim, S.; Won, Y.-H.; Koo, S. S.; Lee, I. S.; Park, T.; Yoon, J. B.; Oh, H.; Lee, Y. J.; Ahn, S.-J.; Kim, J.-S.; Kim, H.-K.; Cho, H.-W.; Lee, S.; Hong, J.; Razavi, P.; Kim, J.; Hur, J. W.
Background: Circulating tumor DNA (ctDNA) detection after curative-intent surgery is being used to identify minimal residual disease (MRD) in colorectal cancer (CRC). However, MRD classification is dependent on analytical sensitivity, and the impact of detection threshold on observed post-operative positivity remains incompletely characterized. We evaluated MRD positivity in stage I-III CRC using a CRISPR-based plasma sequencing assay, MUTE-Seq. Methods: Patients were prospectively enrolled and analyzed using customized tumor-informed panels applied to baseline and post-operative plasma samples collected at 4 weeks and 3 months. We report preliminary results from 39 plasma samples obtained from the first 14 patients. MRD positivity was assessed across multiple hypothetical detection thresholds (1-100 ppm). Results: All 14 patients (100%) had detectable mutations at baseline. The number of mutation-positive calls decreased significantly after surgery (baseline vs 4 weeks, p = 0.006; baseline vs 3 months, p = 0.004), and ctDNA concentration likewise declined (baseline vs 4 weeks, p = 0.002; baseline vs 3 months, p = 0.003). Among stage II-III patients, MRD positivity at 4 weeks was 20% at a 100-ppm threshold but increased to 70% at 10 ppm and 100% at 1 ppm. At 3 months, MRD positivity was 11% at a 100-ppm threshold and 78% at 1 ppm. At both time points, approximately 80% of MRD-positive stage II-III patients harbored ctDNA levels below 100 ppm, and half of these cases were below 15 ppm. Two patients (one stage I and one stage II) developed recurrence; both were MRD-positive at 4 weeks and showed increasing mutation-positive calls at 3 months, with a median radiologic lead time of 4 months. Conclusions: Post-operative MRD classification in CRC is strongly influenced by analytical sensitivity. A substantial proportion of residual disease signals reside below the conventional ctDNA detection threshold of 100 ppm, supporting the clinical relevance of ultrasensitive ctDNA detection.
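The threshold dependence described in this abstract can be illustrated with a small sketch: the same set of ctDNA measurements yields different MRD-positivity rates depending on the detection limit applied. The function name `mrd_positivity` and the ctDNA values below are hypothetical and invented for illustration; they are not the study's data, and calling a sample positive when its level is at or above the threshold is a simplifying assumption, not the MUTE-Seq calling rule.

```python
from typing import Sequence


def mrd_positivity(ctdna_ppm: Sequence[float], threshold_ppm: float) -> float:
    """Fraction of patients whose ctDNA level is at or above the threshold."""
    if not ctdna_ppm:
        raise ValueError("need at least one patient")
    positives = sum(1 for level in ctdna_ppm if level >= threshold_ppm)
    return positives / len(ctdna_ppm)


# Invented post-operative ctDNA levels in ppm, for illustration only.
levels = [0.0, 2.0, 8.0, 12.0, 40.0, 150.0]

for threshold in (100.0, 10.0, 1.0):
    rate = mrd_positivity(levels, threshold)
    print(f"threshold {threshold:g} ppm: {rate:.0%} MRD-positive")
```

Lowering the threshold can only keep or increase the positive fraction, which is why an assay's analytical sensitivity directly shapes the reported MRD rate.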
Hoang, D.-T.; Shulman, E. D.; Dhruba, S. R.; Nair, N. U.; Barman, R. K.; Lalchungnunga, H.; Singh, O.; Nasrallah, M. P.; Stone, E. A.; Aldape, K.; Ruppin, E.
Precision oncology is becoming increasingly integral to clinical practice, demonstrating notable improvements in treatment outcomes. While molecular data provide comprehensive insights, obtaining such data remains costly and time-consuming. To address this challenge, we developed Path2Omics, a deep learning model that predicts gene expression and methylation from histopathology for 23 cancer types. Path2Omics was trained on 20,497 slides (9,456 formalin-fixed and paraffin-embedded (FFPE) and 11,041 fresh frozen (FF)) from 8,007 patients across 23 The Cancer Genome Atlas cohorts. When tested on FFPE slides, the most readily available format in clinical pathology practice, the integrated model outperformed its individual FF and FFPE components, robustly predicting nearly 5,000 genes on average, approximately five times more than our recently published DeepPT model. Externally evaluated on seven independent cohorts, Path2Omics robustly predicted the expression of approximately 4,400 genes, yielding a 30% increase over the FFPE model alone. Finally, we demonstrate that the inferred gene expression is nearly as effective as the actual values in predicting patient survival and treatment response. These results lay the basis for using Path2Omics to advance precision oncology from histopathology slides in a speedy and cost-effective manner. Statement of significance: Path2Omics is a deep learning model that accurately predicts gene expression and methylation from histopathology slides across 23 cancer types. Unlike existing approaches that rely solely on FFPE slides for training, Path2Omics leverages both FFPE and FF slides by constructing two separate models and integrating them. Downstream analyses show that the inferred values from Path2Omics are nearly as effective as actual values in predicting patient survival and treatment response.
Cho, Y.; Lee, J. W.; Shin, S. M.; Hernandez, A. G.; Yuan, X.; Schneider, J.; Hooper, J. E.; Wood, L. D.; Jaffee, E. M.; Deshpande, A.; Ho, W. J.
Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, with liver metastases significantly worsening outcomes. However, distinct features of the tumor microenvironment (TME) between primary and metastatic sites remain poorly defined. Cellular neighborhoods within the TME are recognized as functional units that influence tumor behavior. Conventional spatial methods, which assign equal weights to all cells in a region, fail to capture the nuances of cellular interactions. To address this, we developed Functional Cellular Neighborhood (FunCN) quantification, which integrates both the proportion and proximity of surrounding cells. Applying FunCN to PDAC imaging mass cytometry data, we identified neutrophil-enriched interactions in liver metastases compared to primary tumors, correlating with elevated VISTA expression by tumor cells. Additionally, FunCN clusters around CD8+ T cells in pancreas and liver were associated with higher TIGIT and LAG3, respectively. These findings demonstrate the importance of spatial immune landscapes in PDAC and identify potential therapeutic opportunities.
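The contrast this abstract draws between equal weighting and a measure that combines proportion with proximity can be sketched as follows. This is not the FunCN formula, which the abstract does not give. The Gaussian distance kernel, the `radius` and `bandwidth` parameters, and the function name `weighted_neighborhood` are all assumptions chosen only to show how nearby cells can be made to count more than distant ones.

```python
import math
from collections import defaultdict


def weighted_neighborhood(center, cells, radius=50.0, bandwidth=20.0):
    """Distance-weighted cell-type composition around `center`.

    `cells` is a list of (x, y, cell_type) tuples. The Gaussian kernel,
    `radius`, and `bandwidth` are illustrative assumptions, not FunCN.
    """
    cx, cy = center
    weights = defaultdict(float)
    for x, y, cell_type in cells:
        d = math.hypot(x - cx, y - cy)
        # Ignore anything at the center position itself or beyond the radius.
        if 0.0 < d <= radius:
            weights[cell_type] += math.exp(-(d ** 2) / (2 * bandwidth ** 2))
    total = sum(weights.values())
    # Normalize so the weighted contributions sum to 1 (empty if no neighbors).
    return {t: w / total for t, w in weights.items()} if total else {}


# Invented coordinates: one close neutrophil, two distant T cells.
cells = [(5.0, 0.0, "neutrophil"), (40.0, 0.0, "T cell"), (40.0, 0.0, "T cell")]
composition = weighted_neighborhood((0.0, 0.0), cells)
for cell_type, share in composition.items():
    print(f"{cell_type}: {share:.2f}")
```

With equal weights the two T cells would make up two thirds of this neighborhood. Under the distance kernel the single adjacent neutrophil dominates instead, which is the kind of nuance an equal-weight count cannot express.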